How Rings assign address space

Step 1: align the incoming address to self (to some power of 2).
Step 2: assign the result to the self address.

Step 3: next_addr := self_addr + self_addr_space; // number of registers used locally
Step 4: send next_addr downstream.

Example:

Dim needs 16 addresses.

Uart needs 4; self = 32.
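The four enumeration steps can be sketched as a loop over the ring members. This is a minimal model, not the hardware: `align_up` and `enumerate_ring` are hypothetical names, and the starting address 5 in the usage line is an assumption (the notes do not give the address that reaches Dim). With that start, Dim lands on 16 and passes 32 downstream, which matches the Uart's self = 32 in the example.

```python
def align_up(addr, size):
    """Round addr up to the next multiple of size (size must be a power of 2)."""
    if size <= 0 or size & (size - 1):
        raise ValueError("address space must be a power of 2")
    return (addr + size - 1) & ~(size - 1)


def enumerate_ring(first_addr, members):
    """members: list of (name, self_addr_space) in ring order."""
    next_addr = first_addr
    assigned = {}
    for name, space in members:
        self_addr = align_up(next_addr, space)  # Steps 1 and 2: align, take as self
        assigned[name] = self_addr
        next_addr = self_addr + space           # Step 3: skip the locally used space
        # Step 4: next_addr is what this member sends to the next member
    return assigned, next_addr


print(enumerate_ring(5, [("Dim", 16), ("Uart", 4)]))
# ({'Dim': 16, 'Uart': 32}, 36)
```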



[Figure: flip-flop pair; D input; clocks clk1 and clk2]



If the delay between "clk1" and "clk2" is greater than the delay from Q to D of the
second flip-flop, we have a race, meaning the right-hand flip-flop will
sample the data of Q a whole clock period
early.
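The race condition above is a hold check on one flip-flop-to-flip-flop path. A minimal sketch, assuming the "delay from Q to D" is split into a clock-to-Q delay plus a wire delay, and adding an optional hold time that the notes do not mention; `hold_slack` and all delay values are hypothetical.

```python
def hold_slack(launch_clk_ns, capture_clk_ns, clk_to_q_ns, wire_ns, t_hold_ns=0.0):
    """Hold slack of a flip-flop-to-flip-flop path.

    The capturing D input changes at launch clock + clk-to-Q + wire delay.
    It must stay stable until the capturing clock edge plus the hold time.
    A negative slack is the race: the capturing flip-flop takes the new
    data one whole clock period early.
    """
    data_change_ns = launch_clk_ns + clk_to_q_ns + wire_ns
    return data_change_ns - (capture_clk_ns + t_hold_ns)


# clk2 lags clk1 by 0.8 ns while the Q-to-D path is only 0.5 ns: race.
print(hold_slack(0.0, 0.8, clk_to_q_ns=0.3, wire_ns=0.2) < 0)  # True
```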




[Figure: compound B; clock runs with data]

The problem is a possible race.
However, we control the logic on
each flip-flop leaving the compound:
because it is always the same standard
ring-interface module, we can ensure
that the delay is at least sufficient
and, more importantly, easily checked
after layout.
[Figure: compound with clka and data_a; clock opposes data]

This arrangement has the advantage
of automatically ensuring that no race
condition exists (at least in this simple
case).

[Figure: compound B, data_b]



[Timing diagram: clkb, clka, Qa / data_a]



data_a, which changes after clkb
(which is later than clka), is sampled
by clka. No race.
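To compare the two clock arrangements, the same hold check can be run hop by hop along a chain of members. A minimal sketch under assumed numbers: data flows from member i to member i+1, `clock_arrival_ns[i]` is when the clock edge reaches member i, and every hop has the same clock-to-Q and wire delay. When the clock runs with the data, later arrivals downstream give negative slack; when the clock opposes the data, the skew adds to the margin. `hop_hold_slacks` is a hypothetical name.

```python
def hop_hold_slacks(clock_arrival_ns, clk_to_q_ns, wire_ns, t_hold_ns=0.0):
    """Hold slack for each hop i -> i+1 of a linear chain of members."""
    slacks = []
    for launch, capture in zip(clock_arrival_ns, clock_arrival_ns[1:]):
        data_change = launch + clk_to_q_ns + wire_ns  # new data at next member's D
        slacks.append(data_change - (capture + t_hold_ns))
    return slacks


# Clock runs with data: each downstream member sees the edge 0.2 ns later.
with_data = hop_hold_slacks([0.0, 0.2, 0.4, 0.6], clk_to_q_ns=0.05, wire_ns=0.05)
# Clock opposes data: each downstream member sees the edge 0.2 ns earlier.
opposed = hop_hold_slacks([0.6, 0.4, 0.2, 0.0], clk_to_q_ns=0.05, wire_ns=0.05)
print(all(s < 0 for s in with_data), all(s > 0 for s in opposed))  # True True
```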



[Figure: timing of clka, Qa and OK; compound A and compound B]

[Figure: data and clock waveforms for clka and clkb, with the clock uncertainty range]



data_a leaving the bridge goes to member "b" and
should be sampled there by the rising edge of clkb. clkb
lags far behind clka of the bridge. As the waveforms
clearly show, a race is imminent. Here we
should add latches on all the data lines (~90).
Adding a latch works, however, only if the delay between
clka and clkb is less than 75% of the cycle time;
otherwise the uncertainty eats up the usable time. This
sets a hard limit on the number of ring members.
Also keep in mind that latches are needed on each OK
signal between members of the ring.
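The 75% rule can be turned into a quick budget check. This sketch assumes, beyond what the notes state, that the clka-to-clkb lag grows by a fixed amount for each ring member, which is one way to read "a hard limit on the number of ring members". Function names and the 5 ns / 0.5 ns figures are hypothetical.

```python
def latch_fix_works(clk_lag_ns, cycle_ns, usable_fraction=0.75):
    """True if the clka-to-clkb lag is below the usable fraction of the cycle."""
    return clk_lag_ns < usable_fraction * cycle_ns


def max_ring_members(cycle_ns, lag_per_member_ns, usable_fraction=0.75):
    """Largest member count whose accumulated lag still passes the latch rule.

    Assumption: each member adds the same clock lag (linear accumulation).
    """
    if lag_per_member_ns <= 0:
        raise ValueError("lag per member must be positive")
    limit = usable_fraction * cycle_ns
    n = 0
    while (n + 1) * lag_per_member_ns < limit:
        n += 1
    return n


print(latch_fix_works(3.0, 5.0), latch_fix_works(4.0, 5.0))  # True False
print(max_ring_members(5.0, 0.5))  # 7
```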
[Figure: data and clock waveforms, with the clock uncertainty range]



Here, data_b leaves member "b" to be sampled by
clka in the bridge. Again clkb lags far behind
clka, but now this works to our advantage, provided the
lag is smaller than the better part of a clock cycle. This
solution looks better: between adjacent
members we can take care to delay the data
beyond the danger zone of the clock delay, the OK signals
are covered automatically, and the last-leg data is also
covered. The only unsafe signal is the OK from the
bridge to member "b"; it needs a latch in "b".



[Figure: big module with its ring interface; data from previous member; clk]



The local clock lags behind the
ring_interface clock of this
module, because we presume the
module is big. For data coming
out, this is not a problem: it changes
later than the clock of the ring-i/f flip-flops.
However, for data entering the
module from the previous member,
a race is a possibility we must
look into.
If module "a" sends a message to module "b", the ring works
fine. However, if most of the traffic is from "c" to "b",
this is more expensive in terms of latency.



Another problem is "peak latency". Suppose that "a"
transmits mostly to "d" and "b" mostly to "c". In this case,
communication between "b" and "c" suffers degradation
when the peak traffic coincides.
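Both latency effects follow from the ring being one-directional. A minimal sketch, assuming the ring order a -> b -> c -> d -> a (the order is an assumption, chosen so that "a" to "b" is one hop): `ring_hops` counts hops, and `links_used` lists the ring segments a message occupies, which shows that the "a" to "d" traffic shares the b -> c segment with the "b" to "c" traffic. Both helper names are hypothetical.

```python
def ring_hops(src, dst, ring):
    """Hops from src to dst when data only flows forward around the ring."""
    return (ring.index(dst) - ring.index(src)) % len(ring)


def links_used(src, dst, ring):
    """Ring segments (from, to) that a message from src to dst occupies."""
    links = []
    i = ring.index(src)
    while ring[i] != dst:
        j = (i + 1) % len(ring)
        links.append((ring[i], ring[j]))
        i = j
    return links


RING = ["a", "b", "c", "d"]  # hypothetical order: a -> b -> c -> d -> a
print(ring_hops("a", "b", RING), ring_hops("c", "b", RING))  # 1 3
print(set(links_used("a", "d", RING)) & set(links_used("b", "c", RING)))  # {('b', 'c')}
```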




The land bridge gets its name from the fact that it is
a luxury. It spans across connected modules.
The idea is simple. When V2 sends a message to
D1, it reaches one side of the bridge. This side
analyzes the destination address and, by some
magic (explained later), decides to short-cut the
path. The message re-appears at the other end of
the bridge and quickly reaches D1. By the same magic,
a message from V1 to D2 is also bypassed, while a
message from V1 to D1 is handled directly.




Enumeration is started by the "Anchor",
which assigns address 1 to itself; the results
of enumeration are labels 1 to 7. The land
bridge gets two addresses, as if it were not
one module: the "near" end got
enumeration label "3", and the "far" end is
marked 6.
The bridge takes responsibility for strays,
but only at the "far" end. During
enumeration, the bridge is "polarized" to
have a near and a far end. Near is the end
first struck by the enumeration message.



So we have exactly one enforcer for each 
ring. 



[Figure: ring with the Anchor; bridge near end = 3, far end = 11]



In a land-bridge ring, the situation is trickier. Suppose V2
sends a message to address 5. The land bridge
diverts it at the far end (11); it re-appears at 3 and
starts cycling forever.

We have to define an algorithm that takes
care of all cases.

Luckily there is a way.

The land bridge deals only with messages that arrive
at the far end and are diverted. It marks and
monitors only those. Messages arriving at the near
end keep their markings. Messages at the far end
that go through are left alone.
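The marking rule above can be sketched as a far-end decision function. The notes do not spell out how the bridge recognizes a stray, so the rule here is an assumption: a message that the far end wants to divert a second time has already circled without finding its destination and is dropped. `RingMessage`, `diverted_once`, `far_end_decision` and `near_end` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class RingMessage:
    dest: int
    diverted_once: bool = False  # hypothetical marking set by the bridge's far end


def far_end_decision(msg: RingMessage, wants_divert: bool) -> str:
    """Far end of the land bridge: 'through', 'divert' or 'drop'."""
    if not wants_divert:
        return "through"          # messages going through are left alone
    if msg.diverted_once:
        return "drop"             # assumption: a second diversion means a stray
    msg.diverted_once = True      # mark the message on its first diversion
    return "divert"


def near_end(msg: RingMessage) -> RingMessage:
    """Near end: messages keep their markings unchanged."""
    return msg


stray = RingMessage(dest=5)
print(far_end_decision(stray, True), far_end_decision(stray, True))  # divert drop
```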



APP_ID=10064338

Page 241 of 282

[Figure: ring interface signals: data (64), type, ok, scan_test, clk]

[Timing diagram: clk, type (idle, msgA, msgB, idle) and ok]



During the first clock, OK remains active while type
is msgA. This means that on the next clock
member A may send a new message; member A uses
this OK to send msgB on the next clock. msgB gets
stuck for a clock because OK goes inactive. It goes
inactive because the FIFO in member B is full. One
clock later, the FIFO has a free entry, so OK returns to
1, and type returns to idle on the next clock. The return to idle
could also be a change to the next message, if there were
one.
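The OK/type handshake in the waveform can be replayed cycle by cycle. A minimal sketch, assuming OK is given per clock and a message leaves the type lines only on a clock where OK is 1; `drive_type` is a hypothetical name, and the OK sequence [1, 0, 1, 1] is taken from the description above.

```python
def drive_type(messages, ok):
    """Return the value driven on 'type' for each clock.

    messages: messages member A wants to send, in order.
    ok: OK value at each clock (1 = member B accepts the current message).
    A message stays on 'type' until a clock where OK is 1; on the
    following clock the next message (or idle) is driven.
    """
    pending = list(messages)
    out = []
    for ok_t in ok:
        current = pending[0] if pending else "idle"
        out.append(current)
        if ok_t and pending:
            pending.pop(0)  # accepted: advance on the next clock
    return out


print(drive_type(["msgA", "msgB"], [1, 0, 1, 1]))
# ['msgA', 'msgB', 'msgB', 'idle']
```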



[Figure: member ports imsg/iok, omsg/ook, umsg/uok and dok]




The incoming messages are examined first
to see whether they are supervisor or work/program messages.
Work/program messages have an address field.
We check whether it is our address. Since we know
that our address is aligned to our power of 2,
the address mask (named the split mask)
causes only a certain number of upper bits to
be compared. The lower part of the address
is passed inside as the internal address. The
upper bits are compared against the self-address
register. This register gets its value during the
enumeration protocol. The lower part of this
register is always masked; hopefully
synthesis will delete the unused bits from the
implementation.
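The split-mask check can be sketched in a few lines. Assumptions: a 24-bit ring address (taken from address_out[23:0]; Rif_o_addr[19:0] suggests other widths exist), an address space that is a power of 2, and hypothetical names `split_mask` and `decode_address`. The usage lines reuse the Uart from the enumeration example (self = 32, 4 addresses).

```python
ADDR_WIDTH = 24  # assumption: 24-bit ring addresses, as in address_out[23:0]


def split_mask(addr_space, width=ADDR_WIDTH):
    """Mask selecting the upper address bits compared against the self address."""
    if addr_space <= 0 or addr_space & (addr_space - 1):
        raise ValueError("address space must be a power of 2")
    return ((1 << width) - 1) & ~(addr_space - 1)


def decode_address(incoming, self_addr, addr_space, width=ADDR_WIDTH):
    """Return (ours, internal_address) for an incoming work/program address."""
    mask = split_mask(addr_space, width)
    ours = (incoming & mask) == (self_addr & mask)  # upper bits vs self-address register
    internal = incoming & (addr_space - 1)          # lower bits go inside the member
    return ours, internal


print(decode_address(34, 32, 4))  # (True, 2)
print(decode_address(37, 32, 4))  # (False, 1)
```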



[Figure (Fig. 2-1): the incoming address and the address split mask yield the ours/through decision and the part of the address that enters the member]
[Figure: Member with its RIF; signal groups Rif_i_*, Rif_*, Rif_o_*, module_id; Address Space = 7; Activation register]

[Figure: Member outputs Rif_o_type[7:0], Rif_o_addr[19:0], Rif_o_datal/h[31:0]]
The second land bridge solves most traffic problems, but
adds 4 clocks to the overall ring length. This is not a big
problem, because no message should travel the whole
perimeter.







The Utopia interface is
forced into a mode that
communicates in
messages, not cells. We are
using the I/O and maybe
some of the logic.



[Figure: Network Processor block diagram]

Application Specific Accelerators: CRC, Encryption, Table Lookup, Hashing
Internal Memory: Fast, Unified, Multi-port
Peripheral Expansion Area: Enet, ATM, Uart, USB, Serials
System Expansion Area: CPU (PP), DMA, Smart FIFO, Ext. mem I/F
[Figure: Vobla Compound block diagram]

Vobla Core: Fetch Unit (FTU), Load/Store Unit (LSU), Program Sequencer (PSU), PBU, Arithmetic (DALU), Register File (RFU)
Also shown: Internal Memory, Debug, Agent I/F (AGI), DMA Agent, Agent Bus, External World
[Figure: task environment. Labels: Other, Internal memory, Peripheral FIFO (data in), External NP task chain, "The system", "External world", FIFO (stream data out)]

General purpose registers: 32 registers of 32 bits per task
Special purpose registers: per task and global registers
Doorbells: a set of indicators per task, which control task execution scheduling
Agent interface: an interface to adjacent resources
Preconfiguration registers: initialised by the PP
Internal memory: fast memory accessed by load/store instructions
External memory: big area accessed via a DMA interface
R1 register:

eq - equals zero
lt - less than (negative)
gt - greater than (positive)
c - carry

mb - reflection of the RAM multi-reader busy indication

[Figure: R1 bit layout, bits 31 to 0]
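The flag definitions can be illustrated for a 32-bit addition. This is a sketch under assumptions: the notes do not say which instructions update R1 or where each bit sits, so the flags are returned as a dictionary rather than packed into bit positions, `mb` is omitted because it reflects memory state, and `r1_flags` is a hypothetical name.

```python
def r1_flags(a, b, width=32):
    """eq/lt/gt/c flags for width-bit addition a + b (two's complement)."""
    mask = (1 << width) - 1
    total = (a & mask) + (b & mask)
    result = total & mask
    negative = bool(result >> (width - 1))  # sign bit of the result
    return {
        "eq": result == 0,                   # equals zero
        "lt": negative,                      # less than zero (negative)
        "gt": result != 0 and not negative,  # greater than zero (positive)
        "c": total > mask,                   # carry out of the top bit
    }


print(r1_flags(1, 0xFFFFFFFF))  # {'eq': True, 'lt': False, 'gt': False, 'c': True}
```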
Frame structure of an example task type

[Figure: frame layout: common task data; task fragment 1, 2 and 3 data; data of all level-2 functions (level-2 data blocks 1 to n)]

The size of the level-0 frame part is different for each task type.
The size of the level-1 frame part is different for each task type.
The size of the level-2 frame part is constant.
[Legend: flip-flop symbol; logic symbol]
[Figure: message sender datapath. Inputs: interface_address[15:0], interface_data[31:0], memory data (Data[addr]), message data. Outputs: type_out[7:0], address_out[23:0], data_out[63:0], ok2drive]

[Figure: Vobla entry in the multi-reader: increment, destination address (op2, op1, op0), source address in SRAM, address of destination, number to transfer]
[Figure: Vobla agent interface to the message sender: main entry, shadow entry, output message encoder; outputs type_out[7:0], address_out[23:0], data_out[63:0]; clk, ok2drive]

Agent command fields: opcode, options[9:0], RA, AID[4:0], RB/imm8.

AID = message_sender, options[6] = 0 (message_sender3): {0, options[5:0]}; raw_data[31:0] (message data), raw_address[23:0] (address of destination), type[7:0] (message type).

AID = message_sender, options[6] = 1 (message_sender6): {1, options[5:0]}; raw_data[63:0] (message data), raw_address[23:0] (address of destination), message type.

[Figure: message sender ports: type_in[7:0], wr_interface_address[23:0], wr_interface_data[31:0], type_out[7:0], address_out[23:0], data_out[63:0]]
[Figure: DMA agent command (AID = dma agent): AID[4:0], RB/imm8. DMA request fields: retry, modify, urgent, dir and autosend flags; set ack address; DRAM address; SRAM address; number of bytes to transfer or address modifier. Datapath: register, input, buffer, tx data mux (on demand), clk]

[Figure: CRC agent command (AID = CRC agent): opcode, options[9:0], RA, AID[4:0]; operands DATA[63:32] and DATA[31:0] (RA, RA+1); RESIDUE; option fields: CRC type, data size, generate operation, check mode]
[Figure: Vobla agent interface with control register, prescaler, divide-by-2, counter, time stamp register and reset. Agent command: opcode, options[9:0], RA, AID, RB/imm8; tps[9:0]]
[Figure: doorbell agent: mask control, priority logic, rif_i write, rif_i doorbells, rif_i reset, rif_i clock. Agent command (AID = doorbell agent): opcode, options[9:0], RA, AID[4:0], RB/imm8; global task register mask. Options: set/clear global (OP2), clear request (OP1), clear mask (OP0)]
{0,0,0,0,0, mask_bit_index[2:0]}   write mask
or

{0,0,0,0,1, req_bit_index[2:0]}   write request
or

{1,0,0,0,0, count_value[2:0]}   write counter
or

{0,1,0,0,0,0,0,0}   write TGMR
or

{1,1,0,0,0,0,0, urgent_value}   write urgent
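The five immediate encodings each fill eight bits, consistent with the RB/imm8 field. A decoding sketch, with the caveat that the bit patterns were reconstructed from garbled notes, so this is an illustration rather than a confirmed specification; `decode_doorbell_imm8` is a hypothetical name and bit 7 is taken as the leftmost element of each `{...}` concatenation.

```python
def decode_doorbell_imm8(imm8):
    """Map an 8-bit doorbell immediate to (operation, operand)."""
    if not 0 <= imm8 <= 0xFF:
        raise ValueError("imm8 must fit in 8 bits")
    top5 = imm8 >> 3        # bits 7..3
    low3 = imm8 & 0b111     # bits 2..0
    if top5 == 0b00000:
        return ("write mask", low3)      # mask_bit_index[2:0]
    if top5 == 0b00001:
        return ("write request", low3)   # req_bit_index[2:0]
    if top5 == 0b10000:
        return ("write counter", low3)   # count_value[2:0]
    if imm8 == 0b01000000:
        return ("write TGMR", None)
    if imm8 >> 1 == 0b1100000:
        return ("write urgent", imm8 & 1)  # urgent_value
    raise ValueError(f"unused encoding {imm8:#04x}")


print(decode_doorbell_imm8(0b00001101))  # ('write request', 5)
```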



[Figure: rings input; cycle for rings]
[Figure: Trajan chip and its memory port. Labels: Memory, Memory port, CS, Data, Addr, FIFO RD, FIFO WR, WR FIFO, RD FIFO, FPGA, Encryption, DSP, PCI, Host #1 (PP), Host #2 (DMA), Host #3 (Ring extender), Memory controller, Ring interface, Message sender, Write request generator, Read request generator, WR FIFO used words counter, RD FIFO used words counter]
[Figure: Trajan memory interface. Labels: Trajan FIFO_RD input, Trajan FIFO_WR input, DPR, Host address generator, Target address generator, RAM / FIFO (two), Synchronizer (three), Interrupt, FPGA, Interface to external device (device specific)]
Setup registers:

doorbell, taskid and viscode
urgent viscode
header len and address
threshold to urgent

[Figure: ring, doorbells, ring control]

rx status word: free, crc, ovrn, err, last, size
[Figure: mac]

Setup registers:

doorbell, taskid and viscode
urgent viscode
send ahead address
threshold to transmit
threshold to urgent

doorbell
free entry count
finished frames count

[Figure: ring in, ring out, ring control]
[Figure: fetch request, decode, data execution]